DEDTI versus IEDTI: efficient and predictive models of drug-target interactions

Drug repurposing is an active area of research that aims to decrease the cost and time of drug development. Most of these efforts are concerned with the prediction of drug-target interactions. Many models, from matrix factorization to more recent deep neural networks, have been proposed to identify such relations. Some predictive models focus on prediction quality, while others focus on efficiency, e.g., embedding generation. In this work, we propose new representations of drugs and targets that are useful for both prediction and further analysis. Using these representations, we propose two inductive deep network models, IEDTI and DEDTI, for drug-target interaction prediction. Both use the accumulation of the new representations. IEDTI uses a triplet loss to map the accumulated similarity features into meaningful embedding vectors. Then, it applies a deep predictive model to each drug-target pair to evaluate their interaction. DEDTI directly uses the accumulated similarity feature vectors of drugs and targets and applies a predictive model to each pair to identify their interactions. We conducted comprehensive experiments on the DTINet dataset as well as gold standard datasets, and the results show that DEDTI outperforms IEDTI and the state-of-the-art models. In addition, we conducted a docking study on two newly predicted drug-target pairs, and the results confirm acceptable drug-target binding affinity for both predicted pairs.

De novo drug discovery consumes enormous amounts of money and requires a lengthy investigation with no guarantee of success 1. To overcome these challenges, computational drug discovery methods are increasingly used to identify unknown and hidden drug-target interactions (DTIs) to treat numerous diseases. Computational drug repurposing is a milestone in identifying novel indications for currently marketed drugs against targets of interest. The main idea behind computational drug repurposing strategies is that similar compounds may share similar properties (known as guilt-by-association) 2,3. Three main approaches exist for computational DTI prediction 4. The first is the ligand-based approach, which is used when limited information on the target is available. It relies on the concept that similar compounds have similar properties and interact with similar proteins. In other words, its predictions depend completely on the number of known ligands per protein; therefore, their reliability may be affected by an insufficient ratio of ligands per protein [5][6][7][8][9]. The second is the docking-based approach, which uses the 3D structures of a ligand and a receptor to evaluate the binding affinity between them 10. The molecular docking approach suffers from the lack of enough 3D structures of ligands and receptors 11. The third, the chemogenomic approach, has been defined as the identification and description of all possible molecules that can interact with any therapeutic target; it therefore enables researchers to address the issue of predicting off-target proteins for therapeutic candidates 12,13. This approach tries to avoid the drawbacks of the aforementioned methods by finding the correlations between the chemical space of ligands and the genomic space of proteins 14.
Chemogenomic approaches can be classified into five types: (1) Neighborhood models, (2) Bipartite local models, (3) Network diffusion models, (4) Matrix factorization models, and (5) Feature-based classification models 4 . Matrix factorization is one of the popularly used methods in DTI prediction 15 . Matrix factorization methods 16 manipulate the DTIs and try to find a latent representation of each drug and each target [16][17][18] . Despite the many advantages of this method, matrix factorization suffers from several disadvantages. For example, matrix factorization uses the linear inner product of two vectors. Consequently, it is not the best solution to predict the interaction or relation of drug and target. As a result, we suggest avoiding conventional linear matrix factorization in drug repurposing. The authors mentioned the problems of matrix factorization methods in another work 19 .
In the last few years, chemogenomic methods that utilize machine learning to predict DTIs (e.g. deep, transformer, and graph neural network methods) have become widely used. These methods have come to the scene to evade the drawbacks of other DTI prediction approaches. We introduce some of the state-of-the-art chemogenomic methods. NeoDTI 20 is a graph neural network-based method that utilizes an inductive matrix completion method to predict the DTIs. AutoDTI++ 21 employs an auto-encoder solution in combination with matrix factorization. Because of using matrix factorization, this method suffers from data leakage. HIDTI 22 generates embeddings of targets and drugs by applying neural networks to their different properties and then concatenates all of them. The concatenation of the processed information of each drug-target pair is fed to a residual neural network to identify their interaction. This method suffers from sparsity as well as incomplete generation of embeddings. MolTrans 23 belongs to transformer-based methods which borrow concepts from the deep language models. TransDTI 24 takes advantage of AlphaFold 25 among other pre-trained embeddings and feeds them to a feed-forward neural network to identify DTIs. This paper proposes two scenarios for predicting DTIs using a Deep Neural Network (DNN). They vary mainly in the way of modeling the input drug-target pair. We call the first scenario "indirect embedding DTI" or simply IEDTI and the second one "direct embedding DTI", or DEDTI. Figures 1 and 2 show proposed frameworks, respectively. We use heterogeneous information, including drug-target interactions, drug-drug interactions, drug-side effect associations, drug-disease associations, target-target interactions, target-disease interactions, and similarities of targets, to predict the DTIs. The "Method" section provides a detailed expression of them.

Methods
IEDTI and DEDTI pipelines. IEDTI and DEDTI use drug-target interactions as labels and the remaining information as input to their models. As shown in Fig. 1, the IEDTI has three steps. The first step, pre-processing, involves reading the drug and target matrices and creating their corresponding feature vectors. For drugs, we have two matrices of drug-drug interactions and drug structural similarities. In addition, there are two more matrices of drug-disease and drug-side effect associations. The pre-processing step uses cosine similarity and converts the two latter matrices into similarity matrices. As a result, drugs have four equal-size matrices. We sum them up in the pre-processing step and generate one feature space for drugs. Then, we aim to convert the original feature space to a lower dimensional space. However, the new space needs to preserve the similarities among feature vectors of the original space. To do this, triplet loss is implemented to do meaningful dimension reduction. Triplet loss needs labels of the correlated feature vectors. The original data space does not have any labels. Therefore, the framework applies k-means to the drug vectors, and similar drugs receive the same labels. In other words, we use k-means for sample labeling. This labeling is crucial to prepare the embedding vectors. The same procedure happens for the targets in the pre-processing step.
In the next step, embedding generation, IEDTI uses two deep network modules (DNN 1 and DNN 2) for drugs and targets, respectively. Using DNN 1, it maps each drug feature vector into an embedding space. These new representations must be meaningful, i.e., similar drugs must have similar embeddings. The same happens for targets with DNN 2.
The last step of IEDTI, DTI prediction, predicts the interaction between each drug-target pair. DEDTI, on the other hand, concentrates on DTI prediction exclusively. It consists of two steps, "pre-processing" and "DTI prediction", and differs from IEDTI by excluding the embedding generation step. We discuss both models in more detail below.
Datasets. The datasets were obtained from a previous study on predicting non-homogeneous DTIs 11 (we call it DTINet dataset). This dataset contains data of 708 drugs from DrugBank (Version 3.0) 26 , 1512 target proteins from the HPRD database (Release 9) 27 , 5603 diseases from the Comparative Toxicogenomics database 28 , and 4192 drug side effects from the SIDER database (Version 2) 29 . Also, there are 1923 known interactions among drugs and targets 30 .
Moreover, we conducted an external validation on the gold standard datasets of Enzyme, GPCR, Ion Channel, and Nuclear Receptor 31. Table 1 presents all the datasets' statistics.
Data pre-processing. As mentioned, this study evaluates two scenarios for predicting drug-target interactions; the difference between them is rooted in different data pre-processing and manipulation stages. Before diving into the scenarios, we first describe the data handling. Since the aim is DTI prediction, this paper addresses interactions among drugs and targets. Eight matrices contain all the information and interactions necessary for our DTI prediction.
• $X$, or Drug-Target interactions, with dimension 708 × 1512. Some studies consider another matrix called the Target-Drug matrix, which is nothing but the transpose of the former. This paper uses the drug-target interactions as the prediction labels and therefore needs just one of them.
• $D^{(1)}$, or Drug-Drug structural similarities, with dimension 708 × 708.
• $D^{(2)}$, or Drug-Drug interactions, with dimension 708 × 708.
• $D^{(3)}_{raw}$, or Drug-Disease associations, with dimension 708 × 5603.
• $D^{(4)}_{raw}$, or Drug-Side effect associations, with dimension 708 × 4192.
• $T^{(1)}$, or Target-Target interactions, with dimension 1512 × 1512.
• $T^{(2)}$, or Target-Target similarities, with dimension 1512 × 1512.
• $T^{(3)}_{raw}$, or Target-Disease associations, with dimension 1512 × 5603.

Figure 1. IEDTI's framework. It consists of three steps: pre-processing, embedding generation, and DTI prediction. (I) The first step reads the drug and target matrices. It converts the drug-side effect, drug-disease, and target-disease associations into three similarity matrices. This procedure yields four equal-size matrices for drugs and three equal-size matrices for targets. The framework sums up the four drug matrices and sums up the three target matrices. It applies k-means to assign the same label to similar drugs; each label is shown in a different color. The same happens for the targets. (II) The framework uses a triplet loss to generate embedding vectors for each drug and target using two DNN modules. (III) It concatenates the embeddings of each drug-target pair and feeds them to the third DNN module to predict interactions.

Figure 2. DEDTI's framework. (I) In the pre-processing step, the framework reads the drug and target matrices. It converts the drug-side effect, drug-disease, and target-disease associations into three similarity matrices. This procedure yields four equal-size matrices for drugs and three equal-size matrices for targets. The framework sums up the four drug matrices and sums up the three target matrices. (II) The framework concatenates the feature vectors of each drug-target pair and feeds the concatenations to a deep network module to predict their interactions.

It is worth mentioning that we differentiate the first matrix, $X$, from all other matrices. While we view the other matrices as the input features, $X$ provides the DTI prediction labels. The first scenario, Scenario 1, deals with embedding generation in addition to DTI prediction.
The second scenario, Scenario 2, is exclusively concerned with interaction prediction. In other words, while the former deals with embeddings for further analysis, the latter deals with prediction quality. It is worth mentioning that both methods have the same step of pre-processing.
Both scenarios aim to combine the information from the drug (and target) matrices into a single matrix. The first step transforms the drug matrices $D^{(i)}$, $1 \le i \le 4$, into a single feature matrix, $D$, and the target matrices $T^{(j)}$, $1 \le j \le 3$, into a single feature matrix, $T$. $D^{(1)}$ and $D^{(2)}$ both have the same size of 708 × 708. To generate the drugs' feature space, we convert the two other matrices, $D^{(3)}_{raw}$ and $D^{(4)}_{raw}$, into a space with the same size as $D^{(1)}$ and $D^{(2)}$. In other words, we remove the explicit representation of diseases and side effects from $D^{(3)}_{raw}$ and $D^{(4)}_{raw}$, respectively. We produced the similarity matrices of the drug-disease, drug-side effect, and target-disease matrices with the cosine similarity 32 metric. This type of similarity was chosen for its scale invariance, directionality awareness, use in recommender systems, and computational efficiency 33,34.
Assume $O$ is a matrix of size $o_1 \times o_2$. The goal is to compute the similarity among its rows. To this end, we apply the cosine similarity, whose output is a square matrix $R$ of size $o_1 \times o_1$. The similarity of rows $k$ and $\ell$, $1 \le k, \ell \le o_1$, is denoted $R_{k\ell}$ and is equal to

$$R_{k\ell} = \frac{O_k \cdot O_\ell}{\lVert O_k \rVert \, \lVert O_\ell \rVert} \tag{1}$$

where $O_k$ denotes the $k$-th row of $O$, "$\cdot$" represents the inner product of two vectors, and $\lVert \cdot \rVert$ is the vector's $\ell_2$-norm. Equation 1 is applied to all pairs $(k, \ell)$, $1 \le k, \ell \le o_1$. $D^{(1)}$, $D^{(2)}$, $T^{(1)}$, and $T^{(2)}$ are already similarity matrices. Thus, we apply Equation 1 to the remaining matrices, $D^{(3)}_{raw}$, $D^{(4)}_{raw}$, and $T^{(3)}_{raw}$, to obtain $D^{(3)}$, $D^{(4)}$, and $T^{(3)}$. Eventually, there are four drug similarity matrices, $D^{(1)}$, $D^{(2)}$, $D^{(3)}$, and $D^{(4)}$, each of size 708 × 708, and three target similarity matrices, $T^{(1)}$, $T^{(2)}$, and $T^{(3)}$, each of size 1512 × 1512. These conversions aim to generate feature vectors for drugs as well as targets. We do this by summing up the drug similarity matrices for drugs and the target similarity matrices for targets. Thus, the final drug and target similarity matrices ($D$ and $T$) are obtained as follows.

$$D = \sum_{i=1}^{4} D^{(i)}, \qquad T = \sum_{j=1}^{3} T^{(j)}$$

We consider $D$ and $T$ as the feature matrices for drugs and targets, respectively. In other words, each row of $D$ is an informative representation of a specific drug, and each row of $T$ is an informative representation of a specific target. Having $D$ and $T$, we can describe the scenarios.
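The pre-processing described above can be illustrated with a minimal sketch in plain Python. It applies Equation 1 to the rows of a raw association matrix and then sums equal-size similarity matrices element-wise. The function names, the toy matrices, and the convention that an all-zero row receives similarity 0 are assumptions made for illustration; they are not taken from the original implementation.

```python
import math


def cosine_similarity_rows(O):
    """Return the o1 x o1 matrix R with R[k][l] = cos(O[k], O[l]) (Equation 1)."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in O]
    R = []
    for k, row_k in enumerate(O):
        R_row = []
        for l, row_l in enumerate(O):
            dot = sum(a * b for a, b in zip(row_k, row_l))
            denom = norms[k] * norms[l]
            # Assumption: a row without any association gets similarity 0.
            R_row.append(dot / denom if denom > 0 else 0.0)
        R.append(R_row)
    return R


def sum_matrices(matrices):
    """Element-wise sum of equal-size square matrices."""
    size = len(matrices[0])
    return [[sum(M[i][j] for M in matrices) for j in range(size)]
            for i in range(size)]


# Toy example: 3 drugs x 3 diseases (hypothetical values).
D3_raw = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
D3 = cosine_similarity_rows(D3_raw)  # 3 x 3 drug-drug similarity
D1 = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
D = sum_matrices([D1, D3])  # accumulated drug feature matrix
```

In the paper's setting the same two functions would be applied to the 708 × 5603 and 708 × 4192 drug matrices and to the 1512 × 5603 target matrix; a vectorized library would be preferable at that scale.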

Formulation of problem. This subsection provides the mathematical formulation of IEDTI and DEDTI.
IEDTI formulation. This scenario aims to produce embeddings and DTI predictions from the input feature matrices $D$ and $T$. It generates an embedding for each drug $d_i = D(i, :)$, $1 \le i \le m$, and each target $t_j = T(j, :)$, $1 \le j \le n$. The embeddings of $d_i$ and $t_j$ are $\tilde{d}_i$ and $\tilde{t}_j$, respectively. These new representations occupy smaller spaces, leading to faster and more efficient computation. In addition, they are meaningful, i.e., similar vectors have similar embeddings, and dissimilar vectors have dissimilar embeddings. Then, the scenario predicts the DTIs. We first explain embedding generation, starting with the drugs. Each drug $d_i$ of the matrix $D$ is mapped into a new representation space and is denoted by $\tilde{d}_i$. In other words, the drugs are transformed into a new domain that satisfies the "significant property": similar pairs of vectors have similar pairs of embedding vectors, and vice versa. Thus, we look for a function $g_1$ that converts each $d_i$ of $D$ into an embedding vector such that similar vectors have similar embeddings and dissimilar vectors have dissimilar embeddings, or formally:

$$g_1 : \mathbb{R}^{m} \to \mathbb{R}^{f_1}, \quad \tilde{d}_i = g_1(d_i), \quad f_1 \ll m,$$
$$dist_D(d_i, d_k) \le \tau_D \Rightarrow \widetilde{dist}_D(\tilde{d}_i, \tilde{d}_k) \le \tilde{\tau}_D, \qquad dist_D(d_i, d_k) > \tau_D \Rightarrow \widetilde{dist}_D(\tilde{d}_i, \tilde{d}_k) > \tilde{\tau}_D$$

where $\tau_D \in \mathbb{R}^+$ and $\tilde{\tau}_D \in \mathbb{R}^+$ are comparison thresholds for the drugs' original representations and embedded representations, respectively. It is worth noting that $dist_D$ and $\widetilde{dist}_D$ are, respectively, $\mathbb{R}^{m} \times \mathbb{R}^{m} \to \mathbb{R}^+$ and $\mathbb{R}^{f_1} \times \mathbb{R}^{f_1} \to \mathbb{R}^+$ functions used to measure the similarity among vectors in $D$ and among their embedding vectors. The distance function can be any legitimate function that separates dissimilar vectors and groups similar vectors in the embedding space. The same condition applies to the rows of the target similarity matrix $T$. So, we look for a function $g_2$ with similar conditions on $t_j$, or formally:

$$g_2 : \mathbb{R}^{n} \to \mathbb{R}^{f_2}, \quad \tilde{t}_j = g_2(t_j), \quad f_2 \ll n,$$
$$dist_T(t_j, t_k) \le \tau_T \Rightarrow \widetilde{dist}_T(\tilde{t}_j, \tilde{t}_k) \le \tilde{\tau}_T, \qquad dist_T(t_j, t_k) > \tau_T \Rightarrow \widetilde{dist}_T(\tilde{t}_j, \tilde{t}_k) > \tilde{\tau}_T$$

where $\tau_T \in \mathbb{R}^+$ and $\tilde{\tau}_T \in \mathbb{R}^+$ are comparison thresholds for the targets' original representations $t_j \in \mathbb{R}^{n}$ and embedded representations $\tilde{t}_j \in \mathbb{R}^{f_2}$, respectively.
Each $\tilde{d}_i$ and $\tilde{t}_j$ is the embedding vector, in the new domain, of its corresponding row, $d_i$ or $t_j$, in the drug or target similarity matrix, respectively. Similar to $dist_D$ and $\widetilde{dist}_D$, the two functions $dist_T$ and $\widetilde{dist}_T$ are, respectively, $\mathbb{R}^{n} \times \mathbb{R}^{n} \to \mathbb{R}^+$ and $\mathbb{R}^{f_2} \times \mathbb{R}^{f_2} \to \mathbb{R}^+$ functions used to measure the similarity among vectors in $T$ and among their embedding vectors. The vectors $\tilde{d}_i$, $1 \le i \le m$, and $\tilde{t}_j$, $1 \le j \le n$, are the first type of Scenario 1 output. The second type is the predicted interaction of each drug-target pair. To obtain it, the scenario passes each pair of $\tilde{d}_i$ and $\tilde{t}_j$ to a function $g_3$, which we formally define as follows:

$$\hat{x}_{ij} = g_3(\tilde{d}_i, \tilde{t}_j), \quad 1 \le i \le m, \ 1 \le j \le n,$$

where $\hat{x}_{ij}$ approximates the label $x_{ij}$. Notably, the above explanations are the conceptual formalization of our proposal. The parameters $\tau_D$ and $\tau_T$ are handled using clustering and DNN modules. In other words, we address these three goals with a DNN solution. Our proposed DNN is formed of three modules (DNN 1, DNN 2, DNN 3), each of which models one of the functions $\{g_1, g_2, g_3\}$. The first module (DNN 1) computes the embeddings of the drug similarity vectors. Its input vectors are the rows ($d_i$) of $D$, and its output is the new representation of each row, $\tilde{d}_i$. The second module (DNN 2) computes the target embedding vectors ($\tilde{t}_j$). Its input vectors are the rows ($t_j$) of the target similarity matrix. These two DNN modules are trained with a triplet loss. Finally, the third module (DNN 3) takes the concatenated vectors $(\tilde{d}_i, \tilde{t}_j)$ as inputs and predicts the interactions between the entities of the $D$ and $T$ matrices. The next section provides the structure of the designed DNN in more detail.
DEDTI formulation. This scenario focuses directly on DTI prediction and consists of two steps. The first step defines the feature vector necessary for DTI prediction, using the vectors of $D$ and $T$. In other words, there is one feature vector per drug-target pair. Each feature vector $z$ is derived from a drug $d_i = D(i, :)$, $1 \le i \le m$, and a target $t_j = T(j, :)$, $1 \le j \le n$, i.e., $z = (d_i, t_j)$ and $z \in \mathbb{R}^{m+n}$. The next step predicts the interaction of each given drug-target pair. The pair construction is formalized in Eq. (7).
IEDTI architecture. This subsection provides the deep architecture of IEDTI. We describe its three modules as follows.
1. First module of Deep Neural Network
The first module (DNN 1) takes $d_i = D(i, :)$, $\forall i \in \{1, \dots, m\}$, as input and returns the corresponding embedding vector for each of them. As mentioned earlier, the similarity and dissimilarity among drugs should also be preserved among their corresponding embedding vectors. In other words, if two vectors are similar in the original space, their transformations should be similar in the embedding space. To preserve similarities in the embedding space, we take advantage of the idea introduced by Bordes et al. 35. However, we have changed the objective function. Let us assume that for each $d_i$, we can find the set of its similar vectors in $D$. We call it $Smlr_{d_i}$. On the other hand, each $d_i$ is dissimilar, or less similar, to the remaining vectors of $D$. Using these two sets of similar and dissimilar vectors for each $d_i$, we compute its representation $\tilde{d}_i$. Their formulation can be:

$$Smlr_{d_i} = \{ d_k \in D : dist_D(d_i, d_k) \le \tau_D \}$$

Having this set and its complement for each $d_i \in D$, we define the objective function below:

$$L_d = \sum_{i=1}^{m} \ \sum_{d_k \in Smlr_{d_i}} \ \sum_{d_\ell \notin Smlr_{d_i}} \max\!\left(0, \ \gamma + \widetilde{dist}_D(\tilde{d}_i, \tilde{d}_k) - \widetilde{dist}_D(\tilde{d}_i, \tilde{d}_\ell)\right)$$

It is notable that the set $Smlr_{d_i}$ is defined based on $dist_D$ and $d$, but $L_d$ is based on $\widetilde{dist}_D$ and $\tilde{d}$. Similar vectors should have a smaller distance, and dissimilar vectors must have a larger distance. If the model works properly, $L_d$ must be close to zero. Thus, the objective of DNN 1 is to minimize the cost function $L_d$.
The parameter $\gamma$ is a margin hyperparameter for tuning the objective function. This function is called the triplet loss.
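To make the objective $L_d$ concrete, the following is a minimal sketch of a margin-based triplet loss over cluster labels. It assumes the hinge form with margin $\gamma$ and the Euclidean distance as the embedding-space distance, and it sums over all valid triplets; the semi-hard triplet mining used in the actual implementation is not reproduced here. The function names are illustrative.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors (assumed distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(embeddings, labels, gamma=1.0):
    """Sum of max(0, gamma + d(anchor, positive) - d(anchor, negative)).

    A positive shares the anchor's cluster label; a negative does not.
    All valid triplets are used (no semi-hard mining).
    """
    total = 0.0
    n = len(embeddings)
    for i in range(n):                      # anchor
        for k in range(n):                  # positive candidate
            if k == i or labels[k] != labels[i]:
                continue
            d_pos = euclidean(embeddings[i], embeddings[k])
            for l in range(n):              # negative candidate
                if labels[l] == labels[i]:
                    continue
                d_neg = euclidean(embeddings[i], embeddings[l])
                total += max(0.0, gamma + d_pos - d_neg)
    return total


# Well-separated clusters give zero loss; interleaved ones do not.
separated = triplet_loss([[0, 0], [0, 1], [10, 0], [10, 1]], [0, 0, 1, 1])
interleaved = triplet_loss([[0, 0], [5, 0], [1, 0], [6, 0]], [0, 0, 1, 1])
```

The loss reaches zero once every same-cluster pair is closer than every cross-cluster pair by at least the margin, which matches the statement that $L_d$ should be close to zero for a well-trained module.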
To do this, we can have several layers of neural networks. The number of input layer neurons must be equal to m (the length of d i ). It is also necessary for the number of neurons of the output layer to be equal to f 1 (the length of d i ). It is necessary to have meaningful embeddings. In other words, similar drugs must have similar representations in the embedding space. This aim requires defining a similarity among the original representation of drugs. To this end, we use the k-means algorithm and apply it to the drug vectors and define sets of similar drugs. Using this clustering, DNN 1 computes similar embeddings for the drugs of each set. As mentioned above, we applied the k-means method to put similar drugs (and similar proteins) in the same clusters. Then, we obtain a new representation using a semi-hard triplet loss function. This approach leads to having a shorter distance between every two members in a cluster and a wider gap between each pair of clusters. These clusters act as labels, and the loss function uses them to produce meaningful embeddings. Figure 4 shows t-SNE representations of drugs and targets before and after applying triplet. They show the power of k-means' representation as well as applying triplet embedding vectors. We chose the number of clusters in a way that the clusters have to be roughly equal. Thus, we examined 2 to 64 as the number of clusters for drugs, and 4 is the best possible number of drug clusters. Figure 4a illustrates drugs' k-means representations. Figure 4b is those drugs' separation in the embedding coordinate. Comparing two figures shows the discriminating power of the triplet. The same went for the targets; the best number of clusters was 5. Figure 4c shows the result of applying k-means on targets. Finally, Fig. 4d visualizes the final targets' embeddings.
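The k-means labeling step can be sketched as follows. This is a bare-bones Lloyd's algorithm in plain Python, written only to show how cluster indices become the labels consumed by the triplet loss and how cluster sizes can be inspected when choosing the number of clusters. The deterministic initialization from evenly spaced rows and all function names are assumptions for illustration; a library implementation would normally be used.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_labels(vectors, k, n_iter=100):
    """Assign a cluster label to each vector with Lloyd's algorithm."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Assumption: deterministic initialization from evenly spaced rows.
    step = len(vectors) // k
    centroids = [list(vectors[c * step]) for c in range(k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_sizes(labels, k):
    """Number of members per cluster, useful for checking balance."""
    return [labels.count(c) for c in range(k)]


# Toy drug feature vectors (hypothetical values).
drug_vectors = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
drug_labels = kmeans_labels(drug_vectors, k=2)
sizes = cluster_sizes(drug_labels, 2)
```

Repeating this for several values of k and comparing the resulting `cluster_sizes` is one simple way to look for the roughly equal clusters mentioned above.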

2. Second module of Deep Neural Network
The second module (DNN 2) works like its sibling DNN 1. The difference is that while DNN 1 calculates the embeddings of $d_i \in D$, $i \in \{1, \dots, m\}$, DNN 2 calculates the embeddings of $t_j \in T$, $j \in \{1, \dots, n\}$. For each $t_j$, we define the set of similar vectors as well:

$$Smlr_{t_j} = \{ t_k \in T : dist_T(t_j, t_k) \le \tau_T \}$$

Having the similarity set of each $t_j \in T$ and its complement, we define the objective function below:

$$L_t = \sum_{j=1}^{n} \ \sum_{t_k \in Smlr_{t_j}} \ \sum_{t_\ell \notin Smlr_{t_j}} \max\!\left(0, \ \gamma + \widetilde{dist}_T(\tilde{t}_j, \tilde{t}_k) - \widetilde{dist}_T(\tilde{t}_j, \tilde{t}_\ell)\right)$$

As mentioned for $\tilde{d}$, the distances between similar and dissimilar vectors must behave the same way for $\tilde{t}$. If the model works appropriately, $L_t$ must be close to zero, and the objective of DNN 2 is to minimize the cost function $L_t$. To this end, the first layer of DNN 2 should have $n$ neurons, and its output layer needs to have $f_2$ neurons. In line with the previous subsection, we apply the k-means algorithm to find the sets of similar targets.
The DEDTI feature vector of each drug-target pair is formed as:

$$\forall i \in \{1, \dots, m\} : d_i = D(i, :), \ \forall j \in \{1, \dots, n\} : t_j = T(j, :) \implies z_{i,j} = [d_i \oplus t_j] \tag{7}$$

3. Third module of Deep Neural Network
The number of input-layer neurons of the third module (DNN 3) is equal to $f_1 + f_2$. As mentioned above, the role of the third module is to compute the interaction between each $d_i \in D$, $i \in \{1, \dots, m\}$, and each $t_j \in T$, $j \in \{1, \dots, n\}$, i.e., $x_{ij}$. The output layer has one neuron, which produces an approximation $\hat{x}_{ij}$. Formally, the objective of DNN 3 is to make $\hat{x}_{ij} = DNN_3(\tilde{d}_i \oplus \tilde{t}_j)$ approximate $x_{ij}$. Because $\tilde{d}_i$ and $\tilde{t}_j$ are obtained from DNN 1 and DNN 2, we can rewrite this as $\hat{x}_{ij} = DNN_3(DNN_1(d_i) \oplus DNN_2(t_j))$, where $\oplus$ denotes the concatenation of two vectors. It is necessary to mention that DNN 1, DNN 2, and DNN 3 can each have several hidden layers. We discuss this further in the "Implementation" and "Discussion" sections. Figure 1 shows the general structure of the first proposed scenario. It is notable that the IEDTI model is not an end-to-end model. Therefore, error propagation is not an end-to-end process, and each module has its own error propagation.
DEDTI architecture. The deep network of the second scenario is similar to that of the first one. The only difference is the input vector of the network, which is the concatenation of each $d_i$ and $t_j$, i.e., $z_{i,j} = [d_i \oplus t_j]$. The input layer therefore requires $m + n$ neurons, and the last layer contains a single neuron to predict each DTI.
Implementation. In both scenarios, we used ten-fold cross-validation to obtain a reliable estimate of our algorithms' performance. To tune the hyperparameters, we followed suggestions from previous studies on deep learning and DTI prediction. The results show that these settings perform well in this work.
• DEDTI model. Our first model takes the concatenation of the vector representations of the ith drug and jth protein, $c_{ij}$, as input. Therefore, the input shape is (2220, 1), as we have 708 drugs and 1512 targets. The model passes the input $c_{ij}$ through four consecutive Conv1D layers with the ReLU activation function, each followed by batch normalization and a dropout of 0.5. Next, we use a dense layer after a flatten layer, followed by a dropout of 0.5. Finally, a dense layer with a sigmoid activation function predicts the interaction between the drug and the protein. We compiled the model with the Adam optimizer and the binary cross-entropy loss function. The interaction is binary-valued: zero indicates no interaction, and one indicates a valid interaction. We also initialized the bias of the final dense layer to account for the imbalance of the dataset. In this model, we set the batch size to 1024 in the training phase.
• IEDTI model. The prediction phase of the triplet model is the same as in our first model. However, here we have two extra steps. First, we apply k-means to drugs and proteins separately to find their clusters. Then, we obtain new representations for them using the semi-hard triplet loss. The new vector representations of drugs and proteins each have a size of 256. After that, we feed their concatenations to the prediction phase, as in the previous model. However, the input shape in this scenario is (512). As this input is smaller than that of the previous model, we set the batch size to 64.
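As an illustration of the DEDTI input described above, the sketch below builds one concatenated feature vector per drug-target pair, together with its label taken from the interaction matrix. The function name and the toy matrices are hypothetical; in the paper's setting each drug row has length 708 and each target row has length 1512, giving vectors of length 2220.

```python
def build_pair_features(D, T, X):
    """Return (features, labels) with one entry per drug-target pair.

    features[p] is the concatenation of drug row d_i and target row t_j;
    labels[p] is the known interaction x_ij from the label matrix X.
    Pairs are enumerated drug by drug, target by target.
    """
    features, labels = [], []
    for i, d in enumerate(D):
        for j, t in enumerate(T):
            features.append(list(d) + list(t))
            labels.append(X[i][j])
    return features, labels


# Toy example: 2 drugs, 3 targets (hypothetical values).
D = [[2.0, 0.5], [0.5, 2.0]]
T = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
X = [[1, 0, 0], [0, 0, 1]]
features, labels = build_pair_features(D, T, X)
```

For IEDTI the same construction would apply to the 256-dimensional embeddings instead of the raw feature rows, which yields the 512-dimensional inputs mentioned above.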
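Performance evaluation metrics. We use ten-fold cross-validation to assess the performance of the models, with AUC-ROC, AUPR, F1-score, and MCC as evaluation metrics. AUC-ROC can be overly optimistic on imbalanced data. Thus, we used the other metrics to cover the case of imbalanced data. We compute the sensitivity (recall), specificity, precision, and F1-score based on the following equations, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

$$Sensitivity = \frac{TP}{TP + FN}, \qquad Specificity = \frac{TN}{TN + FP}, \qquad Precision = \frac{TP}{TP + FP}$$

$$F1 = \frac{2 \times Precision \times Sensitivity}{Precision + Sensitivity}$$

While the F1-score is used for imbalanced data evaluation, we also considered MCC due to its advantages in binary classification 36. Its equation is as follows.

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$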
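The threshold-based metrics above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch; the function name and the convention of returning 0 when a denominator is zero are assumptions made for illustration.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, precision, F1-score, and MCC from counts."""

    def ratio(num, den):
        # Assumption: undefined ratios (zero denominator) are reported as 0.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for a 1:3 positive-to-negative test fold.
metrics = classification_metrics(tp=8, fp=2, tn=28, fn=2)
```

AUC-ROC and AUPR are not covered by this sketch because they depend on the ranking of prediction scores rather than on a single confusion matrix.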
Complexity analysis. Let $m$ be the number of drugs, $n$ the number of targets, $n_{di}$ the number of diseases, and $n_{se}$ the number of side effects. We assume that $e_{emb}$ epochs are necessary to generate the secondary representations of drugs and targets, and that each epoch time is equal to $T_e$ for both drugs and targets. For simplicity, we assume no difference in conversion time between drugs and targets. Lastly, we assume that the number of epochs of the predictive model is $e_p$ and that each epoch takes time $T_p$. DEDTI and IEDTI need to compute the primary representation of each drug and each protein. Two drug similarity matrices are already available; the other two must be computed from the drug-disease and drug-side effect associations. For the drug-disease matrix, the methods apply cosine similarity to each pair of drugs; therefore, its time complexity is $O(m^2 n_{di})$. The same happens for the drug-side effect matrix; thus, the complexity of its conversion is $O(m^2 n_{se})$. In total, the conversion for drugs is $O(m^2 (n_{di} + n_{se}))$. Targets need one extra similarity computation, from diseases. Similar to the drug-disease matrix, the complexity of computing the similarity among targets based on their common diseases is $O(n^2 n_{di})$. In this paper, $n$ is greater than $m$, so the similarity computation is dominated by the target term, $O(n^2 n_{di})$. IEDTI additionally computes embeddings of drugs and targets. These secondary representations have a time complexity of $O(e_{emb} (m + n) T_e)$, and since $m < n$, it is $O(e_{emb} \, n \, T_e)$.
Both models have a similar predictive module, and its complexity for evaluating all drug-target pairs is $O(e_p \, m \, n \, T_p)$. The models differ in $T_p$: because its input vectors are shorter, IEDTI has lower time and space complexity than DEDTI.
It is notable that IEDTI with three DNN modules (two for embedding vectors' production and one module for prediction) contains all the steps of embedding preparation and prediction, while the state-of-the-art methods use the available embeddings (e.g., TransDTI) or have higher complexity (IMCHGAN).

Molecular docking analysis. Structure-based molecular docking is a virtual alternative to costly and time-consuming laboratory experiments for finding the "best-fit" orientation of a drug relative to a particular target. Thus, we used this technique to rationalize the interaction potential of chlorzoxazone-PTGS2 and tetrabenazine-ADORA1, two novel predicted drug-target pairs. To this end, the crystal structures of ADORA1 (PDB 5N2S) and PTGS2 (PDB 3QMO) were obtained from the RCSB Protein Data Bank 37. Also, the 3D-SDF structures of tetrabenazine and chlorzoxazone were downloaded from NCBI PubChem 38. The native ligand, HETATM records, and solvent molecules in both protein structures were removed using Discovery Studio, and the steepest descent method was used for energy minimization. Then, the Swiss PDB Viewer (SPDBV) tool 39 was used to acquire the most stable conformation of the proteins. Finally, the last stages of protein preparation, including the addition of polar hydrogens and Kollman charges, were carried out using AutoDock Tools (ADT). The ligands were prepared by adding polar hydrogens and Gasteiger charges. Also, root detection and torsion selection from the torsion tree were performed so that all rotatable bonds could rotate. To determine the "active site" in the binding position of ADORA1, the crystal structure of stabilized ADORA1 in complex with PSB36 at 3.3 Å was visualized using the LIGPLOT+ tool 40.

Results
Overview of DEDTI and IEDTI. To narrow the experimental space required to discover a novel therapeutic agent, this study proposes two computational models called IEDTI and DEDTI. They can assist in identifying new DTIs by incorporating heterogeneous information on drugs and targets. Both scenarios use the drug-target interactions as the prediction labels. As an overview, Figs. 1 and 2 present IEDTI and DEDTI, respectively. Both models extract four types of similarities for drugs and three types of similarities for targets, and both use the accumulated similarity matrices of drugs and targets as their inputs. IEDTI consists of three DNN modules. The first and second modules generate the embedding vectors of drugs and targets, respectively. Thus, their inputs are feature vectors from the accumulated similarity matrices, and their outputs are new embedding vectors. To generate meaningful embeddings, a clustering method is applied to the accumulated matrices. The clustering assigns labels to drugs and targets, and the DNN modules generate similar embedding vectors for inputs with the same label. The third module identifies the interaction of each drug-target pair. Thus, its input is the concatenation of the new embedding vectors of a drug-target pair, and its output is a binary value that indicates whether an interaction exists. DEDTI, on the other hand, consists of a single DNN module. The input of this module is the concatenation of the accumulated similarity vectors of the drug-target pair under examination, and its output is their interaction indicator. The "Methods" section describes both scenarios in detail.
Comparison of performance with other existing models. The prediction performance of our models was evaluated using a ten-fold cross-validation procedure. In each fold, 10% of the data set was used as the test set and the remaining 90% as the training set. We then compared our results with those of five state-of-the-art methods for DTI prediction: HIDTI 22, NeoDTI 20, MolTrans 23, TransDTI 24, and IMCHGAN 41. Because positive and negative DTI samples are imbalanced, we also report the results for positive-to-negative ratios of 1:3 and 1:5, as is common in the literature 22. Tables 2 and 3 show the results for these two sampling ratios, respectively. We compare the results based on AUC-ROC, AUPR, precision, recall, F1-score, and MCC. AUPR, F1-score, and MCC are especially informative when positive and negative samples are imbalanced. IEDTI has a higher AUC-ROC than the HIDTI models and NeoDTI. The HIDTI-simple variant has a higher AUPR than IEDTI at both the 1:3 and 1:5 ratios. However, the standard deviations of the HIDTI models and NeoDTI are much higher than that of IEDTI; in other words, IEDTI fluctuates less across folds. More importantly, as the tables show, DEDTI provides the best AUPR and AUC-ROC across all methods, with minor fluctuations at all ratios and in both metrics. The results show that IEDTI and DEDTI, especially the latter, perform well in the prediction of DTIs. Figures 3a-f show the ROC and PR plots of IEDTI and DEDTI for the ratios 1:1, 1:3, and 1:5. It is worth mentioning that the same holds for IEDTI and DEDTI at the ratio of 1:10.
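To make concrete why F1-score and MCC are more informative than accuracy under class imbalance, the following sketch computes the threshold-based metrics from confusion-matrix counts using their standard definitions. The function name `classification_metrics` and the toy counts (a 1:5 positive-to-negative split) are hypothetical and are not taken from the paper's tables.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from confusion-matrix counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical 1:5 test fold: 100 positive pairs, 500 negative pairs.
# A trivial model that labels every pair as non-interacting:
trivial = classification_metrics(tp=0, fp=0, tn=500, fn=100)
# A model that recovers most positives at the cost of some false alarms:
useful = classification_metrics(tp=80, fp=40, tn=460, fn=20)

for name, m in (("trivial", trivial), ("useful", useful)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

The trivial model still reaches an accuracy above 0.8 on this split, while its F1-score and MCC are both zero, which is the behavior that motivates reporting these metrics alongside AUC-ROC.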
External validation on gold-standard datasets. We applied DEDTI, IMCHGAN, AutoDTI++, and IRNMF to the gold-standard datasets 31 (Enzyme, Ion Channel, GPCR, and Nuclear Receptor). Their AUC-
DEDTI predicts novel interactions. Our model uses the information from the accumulated similarities to predict novel interactions between drugs and targets (Supplementary Data 1). We selected DTIs with a prediction score of at least 0.9 as the top-ranked suggestions of DEDTI. Among the 126 top-ranked predictions (Fig. 6), we found that many are supported by evidence in the literature. For instance, our prediction list includes the interaction between fentanyl and the D2 dopamine receptor (DRD2), which is supported by previous studies 42. However, the list of 126 top predictions also contains some novel interactions that have received less attention in the literature. Two examples are tetrabenazine-adenosine receptor A1 (ADORA1) and chlorzoxazone-prostaglandin-endoperoxide synthase 2 (PTGS2). Adenosine receptor A1, along with four other receptors, forms a defined subgroup of G protein-coupled receptors 43. This protein is distributed throughout the human body and regulates renal function 44. Moreover, recent studies show that knockdown of ADORA1 in human melanoma cell lines significantly suppresses cell proliferation, and this suppression leads to an antitumor effect 45. Although the KEGG database 46 lists 25 approved drugs affecting ADORA1, the drug predicted by DEDTI (tetrabenazine) is not on this list. Tetrabenazine is known as a dopamine-depleting agent developed for the treatment of schizophrenia. Additionally, many studies have demonstrated that this drug can be effective in the treatment of psychotic disorders and hyperkinetic movement disorders 47.
Prostaglandin-endoperoxide synthase 2 (PTGS2), also known as cyclooxygenase 2 (COX-2), is responsible for prostaglandin production and contributes to early pregnancy 48. Furthermore, numerous studies have reported on the role of PTGS2 in the pathogenesis of many conditions, such as inflammation, cardiovascular and gastrointestinal diseases, and colorectal cancer 49. Non-steroidal anti-inflammatory drugs (NSAIDs) are commonly used as inhibitors of this enzyme 50. Chlorzoxazone is an FDA-approved muscle relaxant, which DEDTI also predicted as a potential drug interacting with PTGS2. Although approved drugs are available for both of these targets, identifying a novel candidate among existing approved drugs is always of considerable interest. Therefore, it is worth checking whether the predicted interactions between these two drugs and targets can be further validated.
Figure 7 shows its 3D and 2D representations. As Fig. 8 shows, the complex of tetrabenazine-ADORA1 is formed by an intermediate of a hydrogen interaction
on DTI with a negative sampling ratio of 1:1, DTI with a negative sampling ratio of 1:3, and all gold-standard datasets. In all cases, the statistical analysis was below the error level, except when comparing DEDTI and IMCHGAN on the DTI dataset with a negative sampling ratio of 1:1. In other words, in all other cases DEDTI is significantly better than the other methods. The exception happens for the ratio 1:3, in which DEDTI and IMCHGAN perform equally. Table 6 reports the p-values.
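The text here does not name the statistical test behind the reported p-values. As an illustration only, the sketch below shows one possible way to compare two methods on paired per-fold scores: an exact two-sided sign-flip permutation test. The function name `paired_sign_flip_p_value` and the fold scores are hypothetical, and this test is an assumption, not necessarily the one the authors used.

```python
from itertools import product
from typing import Sequence


def paired_sign_flip_p_value(
    scores_a: Sequence[float], scores_b: Sequence[float]
) -> float:
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis that both methods perform equally, each paired
    difference is equally likely to be positive or negative, so all 2**n sign
    assignments are enumerated (feasible for ten folds: 1024 cases).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        statistic = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding in ties.
        if statistic >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical per-fold AUPR values for two methods over ten folds.
method_a = [0.91, 0.93, 0.92, 0.94, 0.90, 0.92, 0.93, 0.91, 0.94, 0.92]
method_b = [0.88, 0.90, 0.89, 0.90, 0.87, 0.90, 0.89, 0.88, 0.91, 0.89]

p_value = paired_sign_flip_p_value(method_a, method_b)
print(f"p = {p_value:.4f}")
```

When one method wins on every fold, only the two sign assignments that keep all differences aligned are as extreme as the observation, so the smallest attainable p-value with ten folds is 2/1024.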

Discussion and conclusion
We have introduced two methods, IEDTI and DEDTI, both of which use the drug-target interactions not as input features but as labels for DTI prediction. In other words, our methods are inductive, in contrast to NeoDTI 20. NeoDTI uses drug-target information in the feature space, which is quite common in graph neural network methods. More importantly, both training and test samples are visible during its training phase, which makes the method transductive. Transductive methods are not suitable for predicting interactions of samples unseen during training. IEDTI and DEDTI both rely on DNN modules. The former uses three modules (two for producing embeddings and one for prediction), and the latter uses one module (the prediction module). Apart from the number of modules, both have a lower computational complexity than state-of-the-art methods, e.g., HIDTI, NeoDTI, and IMCHGAN. Additionally, IEDTI learns meaningful embeddings directly instead of using available, ready-to-use embeddings.
On the other hand, IEDTI, like methods from the literature such as NeoDTI and HIDTI, transforms the original feature space into a new embedding space. The aim is a meaningful representation of the data and a lower computational overhead for the prediction. We show this in the complexity analysis in the "Methods" section. However, such transformations depend on the conversion method and the labeled data, and in many cases data clustering does not return suitable labels. DEDTI shows that more straightforward methods, without the extra overhead of embedding conversion, perform better in DTI prediction. Better methods for embedding conversion are still needed.
Moreover, methods need to be inductive to be capable of predicting DTIs. Based on Occam's razor, the more straightforward method is the preferred choice for the data, and DEDTI again illustrates this idea. The information available for DTI prediction includes drug-target interactions, drug-drug interactions, drug-drug similarities, drug-side effect associations, drug-disease associations, target-target interactions, target-disease interactions, and similarities of targets. Another important observation from this work is the advantage of summing up similarity matrices instead of concatenating them. Converting the information matrices to similarity matrices makes their dimensions equal, and this conversion makes it possible to sum the information.
The summation of similarity matrices yields a smaller feature space than their concatenation. For example, each drug vector has a size of 708, whereas other methods use feature vectors whose lengths run into the thousands. In addition, the concise feature space avoids a sparse representation of the feature vectors. In other words, each drug sample has a denser representation, which makes it more meaningful.
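The dimensional argument can be illustrated with a small sketch. It assumes Jaccard similarity as the conversion from association matrices to similarity matrices, which is a common choice for this kind of data but is not stated in this section; the function names and the toy drug-disease and drug-side effect matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def jaccard_similarity(assoc: List[List[int]]) -> Matrix:
    """Row-by-row Jaccard similarity of a binary association matrix.

    Two rows with no associations at all are given similarity 0.0 here.
    """
    sets = [{j for j, v in enumerate(row) if v} for row in assoc]
    n = len(sets)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            union = sets[i] | sets[k]
            sim[i][k] = len(sets[i] & sets[k]) / len(union) if union else 0.0
    return sim


def sum_matrices(matrices: List[Matrix]) -> Matrix:
    """Element-wise sum of equally shaped matrices."""
    return [
        [sum(m[i][j] for m in matrices) for j in range(len(matrices[0][0]))]
        for i in range(len(matrices[0]))
    ]


# Hypothetical associations for 3 drugs: 4 diseases and 5 side effects.
drug_disease = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
drug_side_effect = [[1, 0, 1, 0, 0], [1, 0, 1, 0, 1], [0, 1, 0, 0, 0]]

# After conversion both sources are 3 x 3, so they can be summed.
sims = [jaccard_similarity(drug_disease), jaccard_similarity(drug_side_effect)]
summed = sum_matrices(sims)

# Concatenating the raw association rows grows with every added source.
concatenated = [a + b for a, b in zip(drug_disease, drug_side_effect)]

print("summed feature length:", len(summed[0]))
print("concatenated feature length:", len(concatenated[0]))
```

With summation, the feature length stays equal to the number of drugs no matter how many information sources are added, whereas concatenation grows with each source.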
The denser representation is another reason why DEDTI has the best performance across all methods. Notably, in addition to the deep prediction network, DEDTI uses the summed similarity vectors as the feature representation of both the drug and the target. Improving feature embeddings and refining inductive predictive methods are the keys to better DTI prediction.

Data availability
The datasets generated and/or analyzed during the current study are available in the IEDTI-DEDTI repository, github.com/BioinformaticsIASBS/IEDTI-DEDTI.